I am trying to build the RegFS sample to better understand the Windows Projected File System. My code is building without a warning, but I am getting dynamic linking errors. Below is a sample error, with the code causing it right below.
"The procedure entry point PrjWritePlaceholderInfo could not be located in the dynamic link library."
HRESULT VirtualizationInstance::WritePlaceholderInfo(
LPCWSTR relativePath,
PRJ_PLACEHOLDER_INFO* placeholderInfo,
DWORD length
) {
return PrjWritePlaceholderInfo(
_instanceHandle,
relativePath,
placeholderInfo,
length);
}
I'm sure I did something wrong when I was linking. Under [Project Property Pages] > Linker > Input, I prepended "ProjectedFSlib.lib" to "Additional Dependencies."
This is my first time using Visual Studio with libraries not linked in by default, and I've been unable to find instructions on how to locate and link libraries within the Windows SDK.
Thanks for your help!
EDIT:
The DUMPBIN output is:
Dump of file ProjectedFSLib.lib
File Type: LIBRARY
Exports
ordinal name
PrjAllocateAlignedBuffer
PrjClearNegativePathCache
PrjCloseFile
PrjCommandCallbacksInit
PrjCompleteCommand
PrjConfigureVolume
PrjConvertDirectoryToPlaceholder
PrjCreatePlaceholderAsHardlink
PrjDeleteFile
PrjDetachDriver
PrjDoesNameContainWildCards
PrjFileNameCompare
PrjFileNameMatch
PrjFillDirEntryBuffer
PrjFreeAlignedBuffer
PrjGetOnDiskFileState
PrjGetVirtualizationInstanceIdFromHandle
PrjGetVirtualizationInstanceInfo
PrjMarkDirectoryAsPlaceholder
PrjOpenFile
PrjReadFile
PrjStartVirtualizationInstance
PrjStartVirtualizationInstanceEx
PrjStartVirtualizing
PrjStopVirtualizationInstance
PrjStopVirtualizing
PrjUpdateFileIfNeeded
PrjUpdatePlaceholderIfNeeded
PrjWriteFile
PrjWriteFileData
PrjWritePlaceholderInfo
PrjWritePlaceholderInformation
PrjpReadPrjReparsePointData
Summary
D8 .debug$S
14 .idata$2
14 .idata$3
8 .idata$4
8 .idata$5
14 .idata$6
A DUMPBIN of the executable imports results in:
Dump of file regfs.exe
File Type: EXECUTABLE IMAGE
Section contains the following imports:
PROJECTEDFSLIB.dll
14006D2A0 Import Address Table
14006D9E0 Import Name Table
0 time date stamp
0 Index of first forwarder reference
1E PrjWritePlaceholderInfo
1D PrjWriteFileData
19 PrjStopVirtualizing
17 PrjStartVirtualizing
C PrjFileNameMatch
D PrjFillDirEntryBuffer
E PrjFreeAlignedBuffer
0 PrjAllocateAlignedBuffer
11 PrjGetVirtualizationInstanceInfo
12 PrjMarkDirectoryAsPlaceholder
B PrjFileNameCompare
KERNEL32.dll
14006D098 Import Address Table
14006D7D8 Import Name Table
0 time date stamp
0 Index of first forwarder reference
389 IsProcessorFeaturePresent
382 IsDebuggerPresent
466 RaiseException
1B1 FreeLibrary
BA CreateDirectoryW
116 DeleteFileW
59A TerminateProcess
4BD RemoveDirectoryW
621 WriteFile
C2 CreateFile2
86 CloseHandle
267 GetLastError
3F2 MultiByteToWideChar
21D GetCurrentProcess
57B SetUnhandledExceptionFilter
5BC UnhandledExceptionFilter
4E1 RtlVirtualUnwind
4DA RtlLookupFunctionEntry
4D3 RtlCaptureContext
477 ReadFile
2B5 GetProcAddress
5DD VirtualQuery
2BB GetProcessHeap
60D WideCharToMultiByte
450 QueryPerformanceCounter
21E GetCurrentProcessId
2F0 GetSystemTimeAsFileTime
36C InitializeSListHead
352 HeapFree
34E HeapAlloc
27E GetModuleHandleW
2D7 GetStartupInfoW
222 GetCurrentThreadId
ADVAPI32.dll
14006D000 Import Address Table
14006D740 Import Name Table
0 time date stamp
0 Index of first forwarder reference
299 RegQueryValueExW
293 RegQueryInfoKeyW
28C RegOpenKeyExW
27D RegEnumValueW
27A RegEnumKeyExW
25B RegCloseKey
281 RegGetValueW
ole32.dll
14006D438 Import Address Table
14006DB78 Import Name Table
0 time date stamp
0 Index of first forwarder reference
2A CoCreateGuid
MSVCP140D.dll
14006D228 Import Address Table
14006D968 Import Name Table
0 time date stamp
0 Index of first forwarder reference
A5 ??1_Lockit#std##QEAA#XZ
6D ??0_Lockit#std##QEAA#H#Z
296 ?_Xlength_error#std##YAXPEBD#Z
297 ?_Xout_of_range#std##YAXPEBD#Z
VCRUNTIME140D.dll
14006D360 Import Address Table
14006DAA0 Import Name Table
0 time date stamp
0 Index of first forwarder reference
3C memcpy
3D memmove
1 _CxxThrowException
E __CxxFrameHandler3
36 _purecall
3B memcmp
21 __std_exception_copy
22 __std_exception_destroy
8 __C_specific_handler
9 __C_specific_handler_noexcept
25 __std_type_info_destroy_list
2E __vcrt_GetModuleFileNameW
2F __vcrt_GetModuleHandleW
31 __vcrt_LoadLibraryExW
ucrtbased.dll
14006D498 Import Address Table
14006DBD8 Import Name Table
0 time date stamp
0 Index of first forwarder reference
2B6 _register_thread_local_exe_atexit_callback
B5 _configthreadlocale
2CE _set_new_mode
4D __p__commode
11D _free_dbg
52C strcpy_s
528 strcat_s
68 __stdio_common_vsprintf_s
2C2 _seh_filter_dll
B6 _configure_narrow_argv
171 _initialize_narrow_environment
172 _initialize_onexit_table
9F _c_exit
E5 _execute_onexit_table
C2 _crt_atexit
C1 _crt_at_quick_exit
54B terminate
39C _wmakepath_s
3B8 _wsplitpath_s
564 wcscpy_s
A4 _cexit
48D getchar
60 __stdio_common_vfwprintf
35 __acrt_iob_func
4 _CrtDbgReport
567 wcslen
176 _invalid_parameter
4B __p___wargv
49 __p___argc
2CB _set_fmode
EA _exit
450 exit
175 _initterm_e
174 _initterm
13E _get_initial_wide_environment
173 _initialize_wide_environment
B7 _configure_wide_argv
5B __setusermatherr
2C6 _set_app_type
561 wcscmp
5 _CrtDbgReportW
4D8 malloc
2B5 _register_onexit_function
A1 _callnewh
2C3 _seh_filter_exe
Summary
1000 .00cfg
1000 .data
2000 .idata
1000 .msvcjmc
5000 .pdata
17000 .rdata
1000 .reloc
1000 .rsrc
37000 .text
18000 .textbss
As evident, it imports all the necessary functions from PROJECTEDFSLIB.dll
Either add ProjectedFSLib.lib to your libraries or add a:
#pragma comment(lib, "ProjectedFSLib.lib")
line in your code. Also, make sure you are using version 10.0.17763.0 of the SDK. If you are using mingw it would not surprise me if this library has not been made available yet.
The Projected FS is still an optional feature of Windows that requires manual installation to use. Go to Control Panel -> Programs and Features -> Turn Windows Features on or off. In that list of optional features, scroll down to "Windows Projected File System" and make sure that it is enabled there. Only after that is done will you have a ProjectedFSLib.dll show up in your system32 directory.
It's also probably worth noting that it looks like there's only an x64 version of this DLL, so if you're building an x86 program, that might be the reason why you're unable to dynamically link with that DLL.
I want to send hex encoded data to another client via sockets in python. I managed to do everything some time ago in python 2. Now I want to port it to python 3.
Data looks like this:
""" 16 03 02 """
Then I used this function to get it into a string:
x.replace(' ', '').replace('\n', '').decode('hex')
It then looks like this (which is a type str by the way):
'\x16\x03\x02'
Now I managed to find this in python 3:
codecs.decode('160302', 'hex')
but it returns another type:
b'\x16\x03\x02'
And since everything I encode is not a proper language, i cannot use utf-8 or some decoders, as there are invalid bytes in it (e.g. \x00, \xFF). Any ideas on how I can get the string solution escaped again just like in python 2?
Thanks
'str' objects in python 3 are not sequences of bytes but sequences of unicode code points.
If by "send data" you mean calling send then bytes is the right type to use.
If you really want the string (not 3 bytes but 12 unicode code points):
>>> import codecs
>>> s = str(codecs.decode('16ff00', 'hex'))[2:-1]
>>> s
'\\x16\\xff\\x00'
>>> print(s)
\x16\xff\x00
Note that you need to double backslashes in order to represent them in code.
There is an standard solution for Python2 and Python3. No imports needed:
hex_string = """ 16 03 02 """
some_bytes = bytearray.fromhex(hex_string)
In python3 you can treat it like an str (slicing it, iterate, etc) also you can add byte-strings: b'\x00', b'text' or bytes('text','utf8')
You also mentioned something about to encode "utf-8". So you can do it easily with:
some_bytes.encode()
As you can see you don't need to clean it. This function is very effective. If you want to return to hexadecimal string: some_bytes.hex() will do it for you.
a = """ 16 03 02 """.encode("utf-8")
#Send things over socket
print(a.decode("utf-8"))
Why not encoding with UTF-8, sending with socket and decoding with UTF-8 again ?
I'm trying to parse some text information from within project files of an Adobe program (Adobe Premiere Pro). The project files are gzip compressed XML files. These then contain fields which have base64 encoded fields which contain further information about a component of the project. I'm trying to look at the information within these fields.
I can't for the life of me identify the compression scheme used here. It doesn't look like GZip as it doesn't have the appropriate headers. Can anyone help?
An example of a base64 encoded field is:
AQAAAAAAAACk1QAAAAAAAENvbXByZXNzZWRUaXRsZQB4nO0da1MbSa5/iuu+L34Eg6maY4sQ2OWWhBRmk/hTytiGeM+xfbZJYH/83enRPf2cGfMIMyZdFOCR1Gp1t6TWSOPp//03Eb+KW/FVTERNfBMjsRBLMRYzMRX/FP8QTbElGvC/BpipGAB8CNipuCbsn+JCHItfgGqHaH4V+yIRB0AzE5fQ4rM4h08zsfLgF8BlBX2OCPPB6ndftKDPhujA7y78toCi7tEk4i1x5uvPogv4FV2jdEuieA899MUdYI+BaiG+w9UCpNgHuoW4gRbIOZsqAa4ruloBVvUzgOsRzce+uALsBHpjTkXUKNEinZcJ/OzD+FgGF56II+C1JOgJtWZKF4p0U5AWe7yCn3wJ82lRvhnQroIS+hhTxjOYz1VASgVHaPGK1XO0BMe7JHnHhF+IU1q3GfWwBP37kKm/2N6kdq+PaD5GwH1F/VwBPMSvJXmZ9BfAZU4yqrGHcCw/Ws/AWxUTg9dFkrnSv4G2X2kNlax5ss/FJ1jNE/h7BFy6AGuKPbI57NvHYosefD4R7wDG9GydTG/jkLorDkFLjuDnHaznObVgWh+TwNovQMbvMI4vNJKFHK2ydIU/B8g10OB48XpKPItau/Ppz5U7mweklaxleD2SutkFGlwJrZUH1FufsHh9JS1gS45XwxL4O4Pxm1gFSagvHhnj9whvQhOQZUZyzUB6k8qGsxfKk1NhD4xZyh5HQ7RzR6LweWNprzmadsF4siSur7V6r4FmIP4t9eWGLC5kJdvSSrogax9oEH9Oox9Rn7gLoj1vg9R7YAUsYT5tAl7Q98VnUvfmAEUfqDzhOVzjf2w5pR3pK7UzR8nWlN0O8aERI+YNSdgHyATa4p51l/rYopGGR1fP5WnjLuDqluxwlHrzuWWvTyNDcT+/AawPsC/kec9AX/6SvsPmxPozovk7pBbaqy/Ff4CqT7KgDGGqBDjPaJUU7BA4z0gTcE30+iCPfFqf1xn9nRby0XRo5YgZiY/wf0i+cz+1dB+TgAaOyZZwzx3LK9UijEvAJ49JlteA+xs+s6W5VpBFlRijOSQrmBLuGGgmpFu8Mq8gRtwCSWpP9Il91To9J6ArE9KqOa3Nc0u5fu9IfT9tV9ZzSJ4NVyhEZXontXf00zsEFQ+5UPb+t2QlU5IL929XL8I0qFUzis4/yvHYsbyPTch60A/iLJ1K7V5CfOP2mE2XxaO3Jo+evP+Z0B44lfEHWonaVzny9fEJ3VFcUuSM+4vpWT/KsfL+m0WVkE3OifeAbPOCaLu0fwzSVWnSPRfyWo8+i6/2eubYimjVOJW22leunrFuvNtw7TNbPkz3ijlEzXuc5oU17TKd7WrooLujRi18aVqYpW9LyXUquWye/sU9eDN00NW0+poxYv59WDhrp+7B89vWAneC3BrvFXdSHjy+Lv0dSKsZw73GiOLyHTnjGobzcCt+h886Y7ENkTfPiYth6j9ICp2ltWFob5wPVTPYpThc5VnY2rIp1rdr7Lcr7z9x/nfT/IwNdylD+YkQhWo3Ia+wssZrQpnuT7qHwlnQmuP3EaLi9uY9qLkC9r0pt2J7932BiUsAO4Y7TcxfD1PZbRjTnJD2s62YdDacaa9I3yfgm/vkNfYpOzUmuregf6qtT6faz2jWjgmOd253QV6aTx69zdP0Qmgt1yQD5itcbra/MjHaQ7S9Vqb3uKJZ6VMu4Jr84h21QRvrUCVnDzxcK+Xh07JX8e21XuAJHusrdqOviL4i+oroK9bwFZ3oK6KviL7ip/EVRTSqNqxG6V7XYG3fBO5LNMVraaF5d0IrY64a8oe1Vs/TnKpE41SyN1RDRT3tC1UnUzX7YkpdI+LKAXoDdUccsoUQXUJPjPAo2afup7PAvtHFJsQbJcA6PeJxDVeUAVDWkoXPatsraNuTknINF7Mcn+FqRj6ScwfLlEMRVUKrsSDMVFr0inRQzXwYez9vcwSrdUk+fUIj+ZJmK7pytzkne/T94n1a4tNAqsLMle0+ZRC0bbLmosUh7yJqpPG1HuHHpDnXqSyncn1CFtGWFuG2sfdff+/lJ56wR7272LBEcP18YnAwIQntzfwUlDuvJiYh6Y/IZ4zkHjRMnxFAzfFlt7ndvz3Wp12o+1xSHgXajo07o7lxJcuiSig7xNb1u9zzcOXc9llUPPdY51mR91b+dwfWGz19J10Nnyarb/OpjBM5j7cCny5s0q6/DVxxD1ERwP24mM9tIO4tfEZ4k+qzP+4vSxrqGTEhzXicpTTXsJVtuYM/l7WYFYcq2YopV1mWsrsxltIAbX7M7/0twYeqnYYjMi1/3g7UzHxCrObsge9AG77KZ1XNJzCXXks3vjAxvG+G4PeX4FXpEmyXLkG7dAl2Spdgt3QJOqVLsFe6BM302fQyZWhWQIbyPWOzAr6xWQHv2Mz1jy3KuWzDXLUp6us8i0y/5OqoK9Pes9luoxL2i1JUwYYb9F2tashRBVtGOapgzw36lkY15Cg/8mE5yo9/WI7yoyCWoxqxUDXioUZFYqJGReKiRkVio0ZF4qNGQYz0EDlsuJmVMHPmLHFWZUn5tJ+7srQrYmUpVpZiZSlWlpgqVpZiZSlWlmJlKVaWfmxcXP4dQvn3BuVnWcrPr5SfWSk/p1KNbEr5MlQhi1K+Z6xG9qR877hOZakts5F7IG9VKktapvaz2W6sLMXKUqwsxcpSrCw9Vca+Gv60GjFRrCzFytJ9KkvKl/zclSX1rFOsLMXKUqwsxcpSrCzFylKsLMXKUqws/di4uPw7hPLvDcrPspSfXyk/s1J+TqUa2ZTyZahCFqV8z1iN7En53nG9ytIOxRuvnkmm9SpLSqbms9lurCzFylKsLMXKUqwsPVXGvhr+tBoxUawsxcqSX1kyr5feGHxIXlVpICVAfPh8qDBFQnODObcbgtsjD+OwN8wVYc3lzHov4LbsKYy155jfcGnm2XxsQjnIsVXV0HnMEM5swXk6u06QhcVzlbjKwSc6Xgg+uU290TOMdVu9FuYpba1gW5tGczjwKhYhTEL6iyeSTqjuwu8ANSuC2Xjd1yFo3SDYl4nhk5z6VA/DypCpvQ250ll43Zebj/XhCZ3SxBqj3kNZo1Uakv6hnqvdBPVwaPBSdbRr+szneKmV1hA+m+TGmiV1nUAPmHtcpDh9jVe+bOtJ3Nw4iVsbJ/GrjZN425G4RU8BZEut8VmSa4rHSq99o++LwjjTG2kf7Poh0ztraOhUUrdl+ORSjX8vxtKHuGe0+BRmO398YZw9Ptyd5nIHckeocW4bnLWxMN8jnY0322LkoN56nTU6k8Zs+5b2UD7z0txhwxS8L7Nuhd5M7WLdGOfxcUr4/Tn3i1jMN+jGuCXGLdWMW1pU28339+1Cf6/OBI4xTIxhYgwTY5gYw1Qvhtl7QAzzsKyL2tN+bPQSY5efPXb5MTsTZsZ3Y8yyURJvfsxStsQxTolxSplxivuN8ofkWtQJYTHXEuOVKsYrMdcS45aXFrfEXEuMYWIME34DTcy1xNjlpcQuMdcSY5aXErOULXGMU2KcUm6cYn/H6iG5ls6DopWYa4nxSsy1xLglxi0x1xJjmBjDPEUM03xADBNzLTF2qW7sEnMtMWZ5KTFL2RLHOCXGKQ+NU2wIf//5QpgxQwgWimjUOyEe/k0hG/6O9hmNV2+M8Tm4lBzd4B6l3tNrvpdV52eyKBJa1280whXZnE3pv280nxrfOXhDEQNLOJGW4fLJojLbq3cSuhoSpkANGVAfI/HJGKF6x08IZ7cJ0do0vRy+vQy+vSAt68A47cUcnwnXVL0Mqp6ctQnZ2TQw9jDObtMF+KWBbXgtXQpXhz+BJ0SrxRUJR+Q2hdm/P69hnN2mSGafwpW5Vyhzz5HZxuo3ufot9RsmUaI5eZ5uqgHmOH2c289BSvUNtOgPgNxl9BqmNCP3EN6N4cM87Aj6Qu61rmVnUbljCnnxEAW+ExRnkve9pYwFVun4s7CIsz350/n2/CcTo2+Pvj369ujbo2/fRN+eXwmNvj369ujbo2+Pvv15fbsL40zNKXiSa/JjTGNfn4KMd0LnevQ1Xr2heftO7/C7TnsJQVXPIdghzfdYjlhfvQc+MzGXHvcuuBupN2l+pyznUHykvufe7Pr4RNZn9LlMnE/dB54t+N2i74lzBop1GTWhQ3cuW0DRkdWeOznqEDczI4o82tBuC1o206qD4qvOelDczJZoF5hxs7OLLhTpZrD+fG7RKcFxjV0Pm02VxcH1vtlUmJucwagOBVahfqP1++6tRZgmIT9zCb5jRtlFs7r0Ua4e20kWFe7naMMTsj5ciQui7VJOfJDOVDONL9ajz+KrK1oT2slXOTztc5myddy2AG6rMqr42X5XZQiW1ZptTa1XyJawFlSDq0vxl/Qk6iy1JmHmaYul4PNfbArsR50zZY9rTBqSd77UCqTTZ1TtgeW9AkvZpco2r/qtdYbVJ2lvIWvVuOIzpr6lEvIOpObJ1dlsusSasVPHKluy1zwaxOfNG1IcUVV3In0MtjszOGrtukytE7mu2yqhfY8jB32Cy0B+QlmQW4gGNeucrHCRjudfMBbWTheTUDwypRlAyAp0lWVkax5K7V2Q/XHPB2IleVxS7KtmZD3KmuwR4/UbkllruvYdHInrUwN3Ca/WOAv7ieKDRVrXUXVrxrEOrjJp1Iqr8/y4RnRFq25T1h893mbBeHdyx7tTynjrT6grK4o61P3jfrqPuDDXS9pQO06pavSyQ88wb8HfbXkfrqMMPFOKfWVT3h/fJ3pBvh1ojc8Zxfglxi+bE7+0CuMX9RzM08Qv/DTelnwmLz9+8e21SvHL5kYvpk6vG7kMBT4DcQWeYkyfb+DzF/g/rHg8w2/qz9vhO7k7fOcF7vBT+Vn39tgdX8PNLIZ+wob3ghvQSpZJX9foiS7+7N4xzWkOBtDTBfziLuWfTelTJGRnfemjdUbMhSaWzWDfNfAFyB13Qx0bqjnSMhePoFWREbRyRlDPWZ+6k82qe9muDwDp0qdjMTbsynyiOu/MyivZyvUa9UwMzthhmkXE+0fOnttQRcd2YtdMbAxCsmVnvB6lmoEZ2RZDToAWs/dsc3xehqZRM3kghrRLjMRnqk3MqP3/AUeAlD0=
It's a Zlib-compressed string with a 32-byte uncompressed header.
var encoding = #"";
byte[] data = Convert.FromBase64String(encoding);
var compressedArray = new byte[data.Length - 32];
Array.Copy(data, 32, compressedArray, 0, data.Length - 32);
var decompressed = ZlibStream.UncompressBuffer(compressedArray);
var str = Encoding.Unicode.GetString(decompressed);
The header contains the uncompressed length of the data in little-endian order at offset 8: a5 d5 or 0xd5a4, which equals 54692. It is hard to tell from this example if the uncompressed length is stored as a5 d5, a5 d5 00 00, or a5 d5 00 00 00 00 00 00.
I'm trying to write a legacy binary file format in Python 2.7 (the file will be read by a C program).
Is there a way to output the hex representation of integers to a file? I suspect I'll have to roll my own (not least because I don't think Python has the concept of short int, int and long int), but just in case I thought I'd ask. If I have a list:
[0x20, 0x3AB, 0xFFFF]
Is there an easy way to write that to a file so a hex editor would show the file contents as:
20 00 AB 03 FF FF
(note the endianness)?
Since you have some specific formatting needs, I think that using hex is out - you don't need the prefix. We use format instead.
data = [0x20, 0x3AB, 0xFFFF]
def split_digit(n):
""" Bitmasks out the first and second bytes of a <=32 bit number.
Consider checking if isinstance(n, long) and throwing an error.
"""
return (0x00ff & n, (0xff00 & n) >> 8)
[hex(x) + ' ' + hex(y) for x, y in [split_digit(d) for d in data]]
# ['0x20 0x0', '0xab 0x3', '0xff 0xff']
with open('myFile.bin', 'wb') as fh:
for datum in data:
little, big = split_digit(datum)
fh.write(format(little, '02x'))
fh.write(format(big, '02x'))
...or something like that? You'll need to change the formatting a bit, I bet.
This question already has answers here:
Using Regex to generate Strings rather than match them
(12 answers)
Closed 3 years ago.
I would like to know if there is software that, given a regex and of course some other constraints like length, produces random text that always matches the given regex.
Thanks
Yes, software that can generate a random match to a regex:
Exrex, Python
Pxeger, Javascript
regex-genex, Haskell
Xeger, Java
Xeger, Python
Generex, Java
rxrdg, C#
String::Random, Perl
regldg, C
paggern, PHP
ReverseRegex, PHP
randexp.js, Javascript
EGRET, Python/C++
MutRex, Java
Fare, C#
rstr, Python
randexp, Ruby
goregen, Go
bfgex, Java
regexgen, Javascript
strgen, Python
random-string, Java
regexp-unfolder, Clojure
string-random, Haskell
rxrdg, C#
Regexp::Genex, Perl
StringGenerator, Python
strrand, Go
regen, Go
Rex, C#
regexp-examples, Ruby
genex.js, JavaScript
genex, Go
Xeger is capable of doing it:
String regex = "[ab]{4,6}c";
Xeger generator = new Xeger(regex);
String result = generator.generate();
assert result.matches(regex);
All regular expressions can be expressed as context free grammars. And there is a nice algorithm already worked out for producing random sentences, from any CFG, of a given length. So upconvert the regex to a cfg, apply the algorithm, and wham, you're done.
If you want a Javascript solution, try randexp.js.
Check out the RandExp Ruby gem. It does what you want, though only in a limited fashion. (It won't work with every possible regexp, only regexps which meet some restrictions.)
Too late but it could help newcomer , here is a useful java library that provide many features for using regex to generate String (random generation ,generate String based on it's index, generate all String..) check it out here .
Example :
Generex generex = new Generex("[0-3]([a-c]|[e-g]{1,2})");
// generate the second String in lexicographical order that match the given Regex.
String secondString = generex.getMatchedString(2);
System.out.println(secondString);// it print '0b'
// Generate all String that matches the given Regex.
List<String> matchedStrs = generex.getAllMatchedStrings();
// Using Generex iterator
Iterator iterator = generex.iterator();
while (iterator.hasNext()) {
System.out.print(iterator.next() + " ");
}
// it print 0a 0b 0c 0e 0ee 0e 0e 0f 0fe 0f 0f 0g 0ge 0g 0g 1a 1b 1c 1e
// 1ee 1e 1e 1f 1fe 1f 1f 1g 1ge 1g 1g 2a 2b 2c 2e 2ee 2e 2e 2f 2fe 2f 2f 2g
// 2ge 2g 2g 3a 3b 3c 3e 3ee 3e 3e 3f 3fe 3f 3f 3g 3ge 3g 3g 1ee
// Generate random String
String randomStr = generex.random();
System.out.println(randomStr);// a random value from the previous String list
We did something similar in Python not too long ago for a RegEx game that we wrote. We had the constraint that the regex had to be randomly generated, and the selected words had to be real words. You can download the completed game EXE here, and the Python source code here.
Here is a snippet:
def generate_problem(level):
keep_trying = True
while(keep_trying):
regex = gen_regex(level)
# print 'regex = ' + regex
counter = 0
match = 0
notmatch = 0
goodwords = []
badwords = []
num_words = 2 + level * 3
if num_words > 18:
num_words = 18
max_word_length = level + 4
while (counter < 10000) and ((match < num_words) or (notmatch < num_words)):
counter += 1
rand_word = words[random.randint(0,max_word)]
if len(rand_word) > max_word_length:
continue
mo = re.search(regex, rand_word)
if mo:
match += 1
if len(goodwords) < num_words:
goodwords.append(rand_word)
else:
notmatch += 1
if len(badwords) < num_words:
badwords.append(rand_word)
if counter < 10000:
new_prob = problem.problem()
new_prob.title = 'Level ' + str(level)
new_prob.explanation = 'This is a level %d puzzle. ' % level
new_prob.goodwords = goodwords
new_prob.badwords = badwords
new_prob.regex = regex
keep_trying = False
return new_prob
Instead of starting from a regexp, you should be looking into writing a small context free grammer, this will allow you to easily generate such random text. Unfortunately, I know of no tool which will do it directly for you, so you would need to do a bit of code yourself to actually generate the text. If you have not worked with grammers before, I suggest you read a bit about bnf format and "compiler compilers" before proceeding...
I'm not aware of any, although it should be possible. The usual approach is to write a grammar instead of a regular expression, and then create functions for each non-terminal that randomly decide which production to expand. If you could post a description of the kinds of strings that you want to generate, and what language you are using, we may be able to get you started.